\chapter{Experimental Setup}\label{experimental-setup}

In this chapter, we present the practical side of our experiment. In Section \ref{exp-hypothesis}, we state our hypothesis and the way in which we test it using our method. In Section \ref{experiment}, we present the Virtual World that we used and introduce the Subjects of the experiment. In Section \ref{exp-phases}, we describe the Phases of our experiment and the utility of each phase.

\section{Hypotheses}  \label{exp-hypothesis}

	Our hypothesis regarding overspecification is that it is a useful part of communication because it helps establish alignment between speakers for the duration of the entire communication process. This, applied to lexical acquisition, aids the acquisition of new lexemes since it facilitates the establishment of connections between L1 and L2 words and concepts in a long-term manner. In order to test our hypothesis, we use a virtual environment setting and an instruction-giving system to guide subjects in the world and permit them to acquire new vocabulary words. By comparing subjects who practiced new words with different degrees of overspecification, we aim to observe the utility of overspecification for L2 lexical acquisition, since we see it as an instance of lexical alignment.


	We will compare three groups of subjects, one receiving only minimal specification references, the second only overspecified references, and the third receiving overspecification only in cases of hesitation. By giving more information than is strictly needed to identify the objects in the virtual world, we give the subjects more chances to resolve the REs and acquire the new vocabulary; however, by giving the overspecified part of the RE only in case of hesitation, we mimic the multi-utterance reference phenomenon. Our hypothesis for this experiment is that giving overspecified REs during practice exercises will aid lexical acquisition, and giving the extra information in case of hesitation will reduce resolution times compared to constant overspecification. 

One implication of our experiment concerns Reference Domain Theory, and in particular the presence of partitioning in reference via a multi-utterance phenomenon. In the condition of our experiment where subjects received overspecified references in two `fragments' (following hesitation on their part), we model the multi-utterance nature of reference. If these fragments make it easier for subjects to resolve REs, for example by reducing the subjects' resolution time or improving their success rate, we can conclude that partitioning exists and that fragmenting the references helps it take place. If, on the other hand, we find that there is no significant difference between the `pure' overspecification and `fragmented' specification conditions, we can say that partitioning is not aided by multi-utterance reference production, and that its existence was not directly confirmed by our experiment.

	We modeled our experiment in the context of a 3-dimensional virtual environment. Using a virtual environment has several advantages. First, it is easy to build and modify the world and tailor it to our needs. Second, it adds an element of interactivity to language learning, as opposed to flashcards or 2D computer images. Finally, it gave us an array of useful information: for example, it logged the objects visible to the user on the screen, which is similar to eye tracking, and it recorded the user's speed of movement and actions in the virtual world. We exploited this information to analyze the speed and direction of movement of the user, as well as the number of mistakes that they made while carrying out the experiment. Using an instruction-giving system anchored in the virtual world ensured that all subjects went through exactly the same experience, modified only by their own behavior.

%In guiding the subjects through a series of language learning tasks and giving them practice exercises with different degrees of overspecification, we aim to prove the utility of overspecification for establishing alignment between speakers which, applied to the context of language acquisition, facilitates lexeme acquisition. 


\section{Experimental Setup} \label{experiment}

Our experiment used an instruction-giving system to guide subjects through a series of language-learning and practice tasks in a virtual world. We recruited 75 native Spanish-speaking subjects and taught them a total of 9 words in Russian over a series of phases in the context of a 3D virtual ``game''. We controlled for potential sources of bias such as age, sex, profession, and video game familiarity, and asked the subjects for their opinion of the experiment after they had finished.


\subsection{The Virtual World}

	We carried out the experiment in a virtual world specially designed to test our hypothesis. The virtual world was built using the GIVE platform~\citep{Koller_Striegnitz_Gargett_Byron_Cassell_Dale_Moore_Oberlander_2008}. This platform includes a set of tools for designing virtual environments that can contain different virtual objects with distinguishing properties (such as color, shape, etc.).

\begin{figure}[h!]
\centering
\includegraphics[width=10cm]{object.png} 
\caption{Screenshot of the 3D virtual environment used by the subjects. Navigation instructions were given in Spanish. \label{object-room}}
\end{figure}

	For the purpose of this experiment, we implemented an automated system which uses the GIVE platform to generate referring expressions to describe virtual world objects in Russian. The system helped the subjects navigate through the virtual world while teaching them a series of Russian words. The words they learned were from 3 categories: Taxonomical (3 names of objects), Absolute (4 colors) and Relative (2 sides), for a total of 9 words. While navigating through the world, they were prompted to press buttons and were provided with the names of corresponding objects in Russian. Instructions on how to interact with virtual objects and when to move through the phases of the experiment were given in Spanish. All referring expressions to objects the subject had to identify were given in Russian. 

	During experiment planning, we had to decide in which language the instructions should be given. There were several options: giving the instructions in Spanish (with just the names of the objects in Russian), giving them in Spanish and later incorporating Russian little by little, or keeping only the tutorial in Spanish and reducing the rest of the communication to the names of objects and relations in Russian. We decided to give instructions in Spanish and names of objects in Russian, for clarity and ease of comprehension. For example, even though we taught the subjects `left' and `right' in Russian, using the newly taught lexemes to tell subjects which direction to go would have introduced a bias in the experiment, since subjects who acquired the words more quickly would have performed better. For this reason, all instructions except the actual REs were given in Spanish.

	The original GIVE experiments were carried out on-line, which broadens the range and variety of participants and makes it easier to recruit subjects. We nevertheless decided to carry out our experiment in a laboratory setting, for several reasons. Since all subjects used the same computer, we avoided bias due to graphics card limitations and CPU performance. More importantly, a controlled environment ensured that subjects were not influenced during the task, for example by receiving external help or being distracted. While web experiments have been shown to give similar results to laboratory testing~\citep{Koller_09}, we decided that the potential bias entailed by this type of testing was not worth the potential benefits. In addition, since our experiment did not call for a very large number of subjects, a laboratory setting was the best way to ensure the quality of the results.

\subsection{The Subjects}

We recruited 75 subjects, who completed the experiment in an average time of 13.4 min each. 30\% of the participants were women and the rest were men. Most of them were university students from different majors (computer science, psychology, biology, etc.), and the average age was 28. Before they started the experiment, we simply told them that they were going to go through a short language-learning 3D video game to learn some vocabulary in Russian. We divided the 75 subjects into 3 equal groups: the MR (Minimal Reference) Group received vocabulary exercises with minimal REs regarding objects, meaning just enough information to successfully identify the target object; the OR (Overspecified Reference) Group received exercises with overspecified REs; and the FR (Fragmented Reference) Group initially received the minimal RE, and the overspecified RE only when they showed signs of hesitation in identifying the referent.

Before and after the task, we gave the subjects questionnaires (see Appendices~\ref{prequest} and~\ref{postquest}). In the pre-experiment questionnaire, we collected demographic data such as age, sex, profession, whether they were native Spanish speakers, whether they spoke Russian (to avoid bias regarding misunderstood instructions and/or previous knowledge of Russian), and their familiarity with video games. In the post-questionnaire, we asked quality control questions, including questions about the subjects' opinion of the experiment. For instance, we asked subjects whether they had any trouble understanding the instructions and whether they thought that they had sufficient time to resolve the references given, and we asked for personal suggestions or comments regarding the general functioning of the experiment. The purpose of these questions was to identify flaws in the planning of our experiment and to improve future experiments. We also asked subjects personal opinion questions, such as whether they considered that there were too many new Russian words and whether they thought that the Exercise room helped them remember the words correctly. The responses to this last question were meant to indicate whether subjects who received Overspecified REs during the Exercise Phase felt that the references given were too confusing or cognitively demanding, as has been found in previous experiments~\citep{Engelhardt_Bailey_Ferreira_2006,Engelhardt_Bar_11}.


While we originally planned to make English the target language, we later changed it to Russian to avoid potential bias. Since English is taught in grade school in Argentina, the subjects' English levels could vary widely. Even with the questionnaire at the beginning of the test, people could over- or under-assess their knowledge of English, and thereby affect the experiment results. For these reasons, we chose Russian, which is a much less common language in Argentina and would entail a ``clean slate'' learning experience. Indeed, none of the 75 subjects reported any previous knowledge of Russian in the pre-questionnaire.

In addition, the Russian taught was slightly simplified for the subjects. Since our subjects could not read the Cyrillic alphabet (and teaching them the alphabet would have made the experiment much more complicated), we transliterated the REs in Russian using the ISO 9:1995 standard Romanization of Cyrillic, with the phonetic concession to Spanish speakers of replacing `y' with `i' and `zh' with `z'. Russian also has 6 cases, so \emph{``the button''}, ``press \emph{the button}'' and ``the chair to the right of \emph{the button}'' would each require a different case of the word ``button'' (the Nominative, Accusative, and Genitive cases, respectively). We decided not to use Russian grammatical cases in the experiment, both because of its limited duration and to avoid confusion. In the instructions, we gave only the names of objects in Russian (the pressing action was implied, since it is taught in the tutorial), and when giving the position of objects we accepted slightly ungrammatical phrases that use the Nominative (citation) form instead of the Genitive case: *\emph{knopka sprava ot stul} instead of \emph{knopka sprava ot stula}. We deemed this an acceptable concession since the goal of our experiment is not to teach our subjects perfectly grammatical Russian, but to see how effective our teaching strategy is at promoting lexical acquisition.
 

\section{Phases of the Experiment} \label{exp-phases}

	The subject's first contact with the virtual world was in the \emph{Tutorial Phase}: they learned how to navigate the world and the conventions of the experiment. We explained that identifying an object was done by pressing the button on its right. 

Next came the \emph{Priming Phase} (which includes the Object, Color and Spatial Relation Rooms), where the subjects were primed with the Russian vocabulary (colors, objects, and orientations). Figure~\ref{object-room} shows the view that a subject perceived at the beginning of the Priming Phase of the experiment. 
During the Priming Phase, we asked the subjects to press the buttons beside the objects in a room and gave them the name of the object in Russian. Each word stayed visible for 5 seconds and was shown only once (pressing the same button again did not display the word again). No specific order of the objects was enforced. The door to the next room only opened once all buttons in the current room had been pressed. 
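
The room behaviour described above can be summarised in a short sketch. This is only an illustration under stated assumptions, not the code of our GIVE-based system: the class \texttt{PrimingRoom}, its method names and the button identifiers are hypothetical, and timing is reduced to a constant.

```python
class PrimingRoom:
    """Hypothetical model of one priming room (not the actual GIVE implementation)."""

    DISPLAY_SECONDS = 5  # each word stays visible for 5 seconds

    def __init__(self, words_by_button):
        # Maps a (hypothetical) button id to the Russian word it reveals.
        self.words_by_button = dict(words_by_button)
        self.pressed = set()

    def press(self, button):
        """Return the word to display, or None if this button was already pressed."""
        if button not in self.words_by_button:
            raise KeyError(f"unknown button: {button}")
        if button in self.pressed:
            return None  # a word is shown only once
        self.pressed.add(button)
        return self.words_by_button[button]

    def door_open(self):
        """The door opens only when every button in the room has been pressed."""
        return self.pressed == set(self.words_by_button)


# Buttons may be pressed in any order; no order is enforced.
room = PrimingRoom({"button_1": "stul", "button_2": "svet"})
first = room.press("button_2")
repeat = room.press("button_2")
```

In this sketch, the second press of the same button yields nothing, and the door stays closed until the remaining button has been pressed.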

\begin{figure}[h!]
\centering
\includegraphics[width=10cm]{minspec.png} 
\caption{Exercise Phase:  Referring expression received by a subject in the MR condition: \emph{zoltii stul} means `yellow chair'. \label{minimal}}
\end{figure}

The \emph{First Test Phase} evaluated how well the subjects remembered the vocabulary primed in the Priming Phase. Subjects were given a word in Russian and asked to press the button corresponding to that object or property, for a total of 7 objects. The REs produced in this phase were minimal for both the MR group and the OR group. In order to resolve them, the subjects needed to remember all the new words.

\begin{figure}[h!]
\centering
\includegraphics[width=10cm]{overspec.png} 
\caption{Exercise Phase:  Referring expression received by a subject in the OR condition: \emph{zoltii stul sleva ot krasnii svet} means `yellow chair on the left of the red light'. \label{overspecification}}
\end{figure}

In the \emph{Exercise Phase} subjects were given exercises to practice identifying complex referential expressions (combinations of the 4 colors and 3 objects, for a total of 12 objects). The REs given differed for the three groups of subjects. 
The MR subjects received an Absolute property (color) and a Taxonomical property (object), e.g. \emph{zoltii stul} (``yellow chair'' in Russian). 
The OR subjects received an Absolute property, a Taxonomical property and a Relative property (location of the object with regard to a neighboring object), e.g. \emph{zoltii stul sleva ot krasnii svet} (``yellow chair on the left of the red light''). 
The FR subjects initially received an RE identical to that of MR subjects (that is, one with Absolute and Taxonomical properties); however, if the subjects displayed signs of hesitation and were unsure of which object to select, they were provided with the overspecified information, for a resulting RE identical to that of OR subjects.
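
The difference between the three conditions can be made concrete with a minimal sketch. It assumes that an RE is a plain string assembled from the target's properties; the names \texttt{Target} and \texttt{build\_re} are hypothetical and do not correspond to the generation component of our system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical description of a target object in the Exercise Phase."""
    color: str     # Absolute property, e.g. "zoltii"
    noun: str      # Taxonomical property, e.g. "stul"
    relation: str  # Relative property, e.g. "sleva ot"
    landmark: str  # neighboring object, e.g. "krasnii svet"


def build_re(target, condition, hesitated=False):
    """Return the RE a subject would see in the given condition (illustrative only)."""
    minimal = f"{target.color} {target.noun}"
    overspecified = f"{minimal} {target.relation} {target.landmark}"
    if condition == "MR":
        return minimal
    if condition == "OR":
        return overspecified
    if condition == "FR":
        # FR starts out like MR and ends up like OR once hesitation is observed.
        return overspecified if hesitated else minimal
    raise ValueError(f"unknown condition: {condition}")


chair = Target("zoltii", "stul", "sleva ot", "krasnii svet")
mr_re = build_re(chair, "MR")
fr_re_after_hesitation = build_re(chair, "FR", hesitated=True)
```

With the example from the text, the MR call yields \emph{zoltii stul}, and the FR call after hesitation yields the same string as the OR condition.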

Figure~\ref{minimal} shows an RE as received by a subject in the MR condition. In this situation, the target referent, the red plant, was visible to the subject. Figure~\ref{overspecification} shows an RE as received by a subject in the OR condition. The RE in Figure~\ref{overspecification} is overspecified since there was only one red plant in the room. At the moment the screenshot was captured, the target referent was not visible to the subject. The REs in both figures refer to the same object in the virtual world; however, Figure~\ref{minimal} gives only the minimal semantic content needed to find the object, whereas Figure~\ref{overspecification} gives an overspecified description.  We decided to overspecify the REs (such as \emph{yellow chair}) with a relation to a neighboring object (\emph{to the left of the red light}), since case studies show that this is the property most frequently overspecified in corpora~\citep{Viethen_2008,Pechmann_1989}. Pechmann \citeyearpar{Pechmann_1989} suggested that the reason for this is that in order to interpret absolute properties, the speaker only has to look at the object itself, whereas relative specifications entail comparison with other objects. 

The \emph{Second Test Phase} followed the same procedure as the first one and allowed us to measure the subjects' relative improvement after the Exercise Phase. See Appendix~\ref{map_pics} for screenshots of the phases. 

	For the FR subject group, we wanted not only to `fragment' the overspecified RE to make it easier to process, but also to give the second part only when the user demonstrated \emph{hesitation}. In real-life conversation, hesitation can be observed via verbal cues such as a longer production time or spontaneous sounds like \emph{`mmm'}, or via physical cues such as pausing or indecisiveness; in a virtual environment these cues are harder to observe. We decided to give the second, overspecified, part of the reference only if \emph{the subject started moving away from the referent}. If subjects moved directly toward the correct referent, there was no external evidence of hesitation and so overspecification was not triggered. However, if the subject started walking in another direction, the second part of the RE was given. This was meant to simulate the multi-utterance reference process that occurs spontaneously when REs are given. For example, a speaker may start by referring to an object with \emph{``the blue book''}, then, seeing that the listener is hesitating, follow up with \emph{``on the wooden bookshelf''} in order to point them in the right direction. While we are aware that this is not an exact model of hesitation, we believe that it is the most realistic and coherent way to model it given the setup of the GIVE virtual world and the experiment.
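
One possible way to operationalise ``moving away from the referent'' is sketched below. The sketch rests on assumptions that the text does not fix: positions are taken to be 2D coordinates sampled over time, hesitation is flagged once the distance to the referent exceeds the closest distance reached so far by more than a \texttt{tolerance}, and both the function name \texttt{moved\_away} and the tolerance value are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two 2D points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def moved_away(positions, referent, tolerance=0.5):
    """Return True if the subject retreats from the referent (illustrative heuristic).

    positions: chronological list of (x, y) samples of the subject's location.
    tolerance: assumed slack, so that small sideways steps do not count as retreat.
    """
    if not positions:
        return False
    closest = distance(positions[0], referent)
    for point in positions[1:]:
        current = distance(point, referent)
        if current > closest + tolerance:
            return True  # here the second part of the RE would be triggered
        closest = min(closest, current)
    return False


referent = (5.0, 0.0)
direct_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
wrong_way = [(0.0, 0.0), (-1.0, 0.0)]
```

Under this heuristic, \texttt{direct\_path} never triggers the second fragment, whereas \texttt{wrong\_way} does as soon as the subject has stepped back by more than the tolerance.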

	We believe that the three conditions that we have chosen for the three groups of our experiment are pertinent for modeling real-world reference-giving situations. On the one hand, we study the difference between minimal (MR) and overspecified (OR) references regarding the same objects and in the same practice situations. On the other hand, we compare situations in which the reference is fragmented in order to mimic the partitioning phenomenon proposed by Reference Domain Theory. We will compare the FR condition with both the MR condition and the OR condition to see if giving multi-utterance references is more effective than giving just the minimum needed for identification, and also if it is more effective than giving several of an object's properties at once. In order to make these comparisons, we will look at several metrics from the results we collected, including: the rate at which subjects acquired vocabulary, the reduction in the number of errors between the two test phases, their success rate and resolution speed, as well as the subjective evaluations that they gave to various aspects of the experiment. 

